162 research outputs found
ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code
Automatic code optimization is a complex process that typically involves
applying multiple discrete algorithms that modify the program structure
irreversibly. However, these algorithms are often designed monolithically,
and similar analyses must be reimplemented repeatedly because the algorithms
do not cooperate. To address this issue, modern optimization techniques,
such as equality saturation, allow exhaustive term rewriting at various
levels of the input, thereby simplifying compiler design.
In this paper, we apply equality saturation to optimize the sequential code
used in directive-based programming for GPUs. Our approach simultaneously
achieves less computation, fewer memory accesses, and higher memory
throughput. Our fully automated framework constructs single-assignment forms
of its inputs so they can be rewritten exhaustively while preserving
dependencies, and then extracts the optimal variants. Through practical
benchmarks, we demonstrate significant performance improvements across
several compilers. Furthermore, we highlight the advantages of computational
reordering and emphasize the significance of memory-access order on modern
GPUs.
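The rewrite-then-extract flow described above can be illustrated with a minimal sketch. Production systems use e-graphs to share equivalent subterms compactly; this flat-set version is only a toy illustration of the idea, and the rewrite rules and cost model below are hypothetical examples, not the paper's actual rules.

```python
# Toy sketch of equality saturation: grow a set of provably equivalent
# forms of an expression by exhaustive rewriting, then extract the
# cheapest one under a cost model.

def rewrites(expr):
    """Yield expressions equal to `expr` under two toy rewrite rules."""
    if isinstance(expr, tuple):
        op, a, b = expr
        if op == "*" and b == 2:
            yield ("+", a, a)            # x * 2  ->  x + x
        if op == "+" and a == b:
            yield ("*", a, 2)            # x + x  ->  x * 2
        for a2 in rewrites(a):           # rewrite inside subterms too
            yield (op, a2, b)
        for b2 in rewrites(b):
            yield (op, a, b2)

def cost(expr):
    """Toy cost model: a multiply costs 4, an add costs 1, leaves cost 0."""
    if not isinstance(expr, tuple):
        return 0
    op, a, b = expr
    return (4 if op == "*" else 1) + cost(a) + cost(b)

def saturate_and_extract(expr, max_iters=10):
    seen = {expr}
    for _ in range(max_iters):           # iterate until no new forms appear
        new = {r for e in seen for r in rewrites(e)} - seen
        if not new:
            break
        seen |= new
    return min(seen, key=cost)

best = saturate_and_extract(("*", "x", 2))
print(best)  # ('+', 'x', 'x') — the add is cheaper than the multiply
```

Because saturation keeps every equivalent form instead of committing to one rewrite order, the extraction step can pick the globally cheapest variant, which is what makes the approach attractive for monolithic, order-sensitive optimization pipelines.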
JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization
Rapid developments in computing technology have paved the way for
directive-based programming models to take a principal role in maintaining
the software portability of performance-critical applications. Such models
demand minimal engineering cost to enable computational acceleration on
multiple architectures, since programmers are only required to add metadata
to sequential code. Obtaining the best possible efficiency, however, is
often challenging. The directives inserted by the programmer can have
side effects that limit the compiler optimizations available, which can
degrade performance. This is exacerbated when targeting multi-GPU systems,
as pragmas do not automatically adapt to such systems and require expensive,
time-consuming code adjustments by programmers.
This paper introduces JACC, an OpenACC runtime framework that enables the
dynamic extension of OpenACC programs by serving as a transparent layer between
the program and the compiler. We add a versatile code-translation method for
multi-device utilization, by which manually optimized applications can be
distributed automatically while keeping the original code structure and
parallelism. We show, in some cases, nearly linear scaling of kernel
execution on NVIDIA V100 GPUs. While adaptively using multiple GPUs, the
resulting performance improvements amortize the latency of GPU-to-GPU
communications.
Comment: Extended version of a paper to appear in: Proceedings of the 28th
IEEE International Conference on High Performance Computing, Data, and
Analytics (HiPC), December 17-18, 202
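The core of the multi-device distribution described above is splitting a parallel loop's iteration space into disjoint slices, one per GPU. The following sketch simulates that partitioning in plain Python; the device launch is only pretended by a loop, and all function names are illustrative, not JACC's actual API.

```python
# Sketch of kernel-level work distribution: split an OpenACC-style
# parallel loop's iteration space into near-equal contiguous chunks,
# one per device, so each GPU computes a disjoint slice.

def partition(n, num_devices):
    """Split range(n) into num_devices near-equal contiguous [lo, hi) slices."""
    base, rem = divmod(n, num_devices)
    bounds, lo = [], 0
    for d in range(num_devices):
        hi = lo + base + (1 if d < rem else 0)   # spread the remainder
        bounds.append((lo, hi))
        lo = hi
    return bounds

def run_distributed(x, num_devices):
    """Pretend each 'device' runs the kernel y[i] = 2*x[i] on its slice."""
    y = [0] * len(x)
    for lo, hi in partition(len(x), num_devices):
        for i in range(lo, hi):                  # slices are independent
            y[i] = 2 * x[i]
    return y

print(partition(10, 4))                  # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(run_distributed([1, 2, 3, 4], 2))  # [2, 4, 6, 8]
```

When the slices are fully independent, as here, no inter-device exchange is needed and scaling can approach linear; kernels whose slices share boundary data additionally need the GPU-to-GPU communication whose latency the abstract discusses amortizing.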
A symbolic emulator for shuffle synthesis on the NVIDIA PTX code
Various kinds of applications take advantage of GPUs through automation tools that attempt to automatically exploit the performance available from the GPU's parallel architecture. Directive-based programming models, such as OpenACC, are one such method, easily enabling parallel computing simply by attaching code annotations to loops. Such abstract models, however, often prevent programmers from making additional low-level optimizations that take advantage of the advanced architectural features of GPUs, because the generated computation is hidden from the application developer. This paper describes and implements a novel, flexible optimization technique that operates by inserting a code-emulation phase at the tail end of the compilation pipeline. Our tool emulates the generated code using symbolic analysis, substituting dynamic information and thus allowing further low-level code optimizations to be applied. We implement our tool to support both CUDA and OpenACC directives as the frontend of the compilation pipeline, thus enabling low-level GPU optimizations for OpenACC that were not previously possible. We demonstrate the capabilities of our tool by automating warp-level shuffle instructions that are difficult for even advanced GPU programmers to use. Lastly, evaluating our tool with a benchmark suite and complex application code, we provide a detailed study assessing the benefits of shuffle instructions across four generations of GPU architectures.
We are funded by the EPEEC project from the European Union's Horizon 2020 research and innovation program under grant agreement No. 801051 and the Ministerio de Ciencia e Innovación-Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033). This work has been partially carried out on the ACME cluster owned by CIEMAT and funded by the Spanish Ministry of Economy and Competitiveness project CODEC-OSE (RTI2018-096006-B-I00).
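The symbolic-emulation idea can be sketched as follows: give each of the 32 warp lanes a symbolic expression instead of a concrete value, emulate a shuffle-down step that lets lane i read lane i+delta, and inspect the expression that accumulates on lane 0. The shuffle below mirrors the general semantics of CUDA's `__shfl_down_sync`, but over symbols; this is a toy illustration, not the paper's PTX-level emulator.

```python
# Symbolic warp emulation sketch: lanes hold symbolic strings, and a
# butterfly of shuffle-down steps builds the expression of a warp-wide
# sum on lane 0, which an analyzer could then recognize as a reduction.

WARP = 32

def shfl_down(lanes, delta):
    """Lane i reads the value of lane i + delta (out of range: keep own)."""
    return [lanes[i + delta] if i + delta < WARP else lanes[i]
            for i in range(WARP)]

def emulate_warp_reduce():
    lanes = [f"v{i}" for i in range(WARP)]       # symbolic per-lane inputs
    delta = WARP // 2
    while delta > 0:                             # log2(32) = 5 steps
        shifted = shfl_down(lanes, delta)
        lanes = [f"({a}+{b})" for a, b in zip(lanes, shifted)]
        delta //= 2
    return lanes[0]                              # lane 0 holds the warp sum

expr = emulate_warp_reduce()
# lane 0's expression mentions every lane's input:
print(all(f"v{i}" in expr for i in range(WARP)))  # True
```

Because the emulator sees the full symbolic expression rather than runtime values, it can prove that a sequence of shared-memory loads is equivalent to such a shuffle tree and rewrite it accordingly, which is the kind of low-level optimization the abstract targets.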
Multi-GPU design and performance evaluation of homomorphic encryption on GPU clusters
We present a multi-GPU design, implementation, and performance evaluation of the Halevi-Polyakov-Shoup (HPS) variant of the Fan-Vercauteren (FV) levelled Fully Homomorphic Encryption (FHE) scheme. Our design follows a data-parallelism approach and uses partitioning methods to distribute the workload of FV primitives evenly across the available GPUs. The design addresses the space and runtime requirements of FHE computations. It is also suitable for distributed-memory architectures, and includes efficient GPU-to-GPU data-exchange protocols. Moreover, it is user-friendly, as no user intervention is required for task decomposition, scheduling, or load balancing. We implement and evaluate the performance of our design on two homogeneous and heterogeneous NVIDIA GPU clusters: K80, and a customized P100. We also provide a comparison with a recent shared-memory-based multi-core CPU implementation, using two homomorphic circuits as workloads: vector addition and multiplication. Moreover, we use our multi-GPU levelled FHE to implement the inference circuit of two Convolutional Neural Networks (CNNs) to perform image classification homomorphically on encrypted images from the MNIST and CIFAR-10 datasets. Our implementation provides 1 to 3 orders of magnitude speedup over the CPU implementation on vector operations. In terms of scalability, our design shows reasonable scaling curves when the GPUs are fully connected.
This work is supported by A*STAR under its RIE2020 Advanced Manufacturing and Engineering (AME) Programmatic Programme (Award A19E3b0099).
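The even-partitioning idea behind the data-parallel design can be sketched with homomorphic vector addition, which in lattice-based schemes reduces to coefficient-wise addition modulo the ciphertext modulus and therefore needs no cross-device exchange. The modulus and block layout below are toy values chosen for illustration; the actual HPS/FV implementation works over RNS limbs.

```python
# Sketch of data-parallel FHE addition: split ciphertext coefficient
# vectors into even contiguous blocks, one per GPU, and run the
# coefficient-wise modular addition on each block independently.

Q = 2**20 - 3                                  # toy ciphertext modulus

def split_blocks(coeffs, num_gpus):
    """Partition a coefficient vector into num_gpus contiguous blocks."""
    n = len(coeffs)
    step = -(-n // num_gpus)                    # ceiling division
    return [coeffs[i:i + step] for i in range(0, n, step)]

def he_add_block(a_block, b_block):
    """Per-'GPU' kernel: coefficient-wise addition modulo Q."""
    return [(a + b) % Q for a, b in zip(a_block, b_block)]

def he_add_multi_gpu(a, b, num_gpus):
    out = []
    for ab, bb in zip(split_blocks(a, num_gpus), split_blocks(b, num_gpus)):
        out.extend(he_add_block(ab, bb))        # blocks run independently
    return out

a = list(range(8))
b = [10] * 8
print(he_add_multi_gpu(a, b, 4))  # [10, 11, 12, 13, 14, 15, 16, 17]
```

Addition partitions trivially like this; homomorphic multiplication is the harder case, since its key-switching and NTT steps couple coefficients across blocks, which is where the GPU-to-GPU exchange protocols mentioned in the abstract come in.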
Cation-Disordered Li3VO4: Reversible Li Insertion/Deinsertion Mechanism for Quasi Li-Rich Layered Li1+x[V1/2Li1/2]O2 (x = 0–1)
The reversible lithiation/delithiation mechanism of the cation-disordered Li3VO4 material was elucidated, including an understanding of the structural and electrochemical signature changes during cycling. The initial exchange of two Li induces a progressive and irreversible migration of Li and V ions from tetrahedral to octahedral sites, confirmed by a combination of in situ/operando X-ray diffraction and X-ray absorption fine structure analyses. The resulting cation-disordered Li3VO4 can smoothly and reversibly accommodate two Li and shows a Li+ diffusion coefficient two orders of magnitude larger than that of pristine Li3VO4, leading to improved electrochemical performance. This cation-disordered Li3VO4 negative electrode offers new opportunities for designing high-energy and high-power supercapacitors. Furthermore, it opens new paths for preparing disordered compounds with the general hexagonal close-packing structure, including most polyanionic compounds, whose electrochemical performance can easily be improved by simple cation mixing.
Effect of Heart Failure on Long‐Term Clinical Outcomes After Percutaneous Coronary Intervention Versus Coronary Artery Bypass Grafting in Patients With Severe Coronary Artery Disease
[Background] Heart failure might be an important determinant in choosing coronary revascularization modalities. No previous study has evaluated the effect of heart failure on long‐term clinical outcomes after percutaneous coronary intervention (PCI) relative to coronary artery bypass grafting (CABG). [Methods and Results] Among 14 867 consecutive patients undergoing first coronary revascularization with PCI or isolated CABG between January 2011 and December 2013 in the CREDO‐Kyoto PCI/CABG registry Cohort‐3, we identified the current study population of 3380 patients with three‐vessel or left main coronary artery disease, and compared clinical outcomes between PCI and CABG stratified by subgroups based on heart failure status. There were 827 patients with heart failure (PCI: N=511, and CABG: N=316), and 2553 patients without heart failure (PCI: N=1619, and CABG: N=934). In patients with heart failure, the PCI group compared with the CABG group more often had advanced age, severe frailty, acute and severe heart failure, and elevated inflammatory markers. During a median 5.9 years of follow‐up, there was a significant interaction between heart failure and the mortality risk of PCI relative to CABG (interaction P=0.009), with excess mortality risk of PCI relative to CABG in patients with heart failure (HR, 1.75; 95% CI, 1.28–2.42; P<0.001) and no excess mortality risk in patients without heart failure (HR, 1.04; 95% CI, 0.80–1.34; P=0.77). [Conclusions] There was a significant interaction between heart failure and the mortality risk of PCI relative to CABG, with excess risk in patients with heart failure and neutral risk in patients without heart failure.
Percutaneous coronary intervention using new-generation drug-eluting stents versus coronary arterial bypass grafting in stable patients with multi-vessel coronary artery disease: From the CREDO-Kyoto PCI/CABG registry Cohort-3
AIMS: There is a scarcity of studies comparing percutaneous coronary intervention (PCI) using new-generation drug-eluting stents (DES) with coronary artery bypass grafting (CABG) in patients with multi-vessel coronary artery disease. METHODS AND RESULTS: The CREDO-Kyoto PCI/CABG registry Cohort-3 enrolled 14927 consecutive patients who underwent first coronary revascularization with PCI or isolated CABG between January 2011 and December 2013. The current study population consisted of 2464 patients who underwent multi-vessel coronary revascularization, including revascularization of the left anterior descending coronary artery (LAD), either with PCI using new-generation DES (N = 1565) or with CABG (N = 899). Patients in the PCI group were older and more often had severe frailty, but had less complex coronary anatomy and less complete revascularization than those in the CABG group. The cumulative 5-year incidence of a composite of all-cause death, myocardial infarction, or stroke was not significantly different between the 2 groups (25.0% versus 21.5%, P = 0.15). However, after adjusting for confounders, the excess risk of PCI relative to CABG turned out to be significant for the composite endpoint (HR 1.27, 95%CI 1.04-1.55, P = 0.02). PCI as compared with CABG was associated with comparable adjusted risk for all-cause death (HR 1.22, 95%CI 0.96-1.55, P = 0.11) and stroke (HR 1.17, 95%CI 0.79-1.73, P = 0.44), but with excess adjusted risk for myocardial infarction (HR 1.58, 95%CI 1.05-2.39, P = 0.03) and any coronary revascularization (HR 2.66, 95%CI 2.06-3.43, P<0.0001). CONCLUSIONS: In this observational study, PCI with new-generation DES as compared with CABG was associated with excess long-term risk for major cardiovascular events in patients who underwent multi-vessel coronary revascularization including the LAD.
Ethylene-Vinyl Alcohol Copolymer as an Artificial Tooth Root Material
Kyoto University, Doctor of Engineering (course doctorate, new system), Kou No. 11144 / Eng. Doc. No. 2423, New system||Eng||1323 (University Library), UT51-2004-R19. Kyoto University Graduate School of Engineering, Department of Polymer Chemistry. Examiners: Prof. Hiroo Iwata (chief), Prof. Sadami Tsutsumi, Prof. Yasuhiko Tabata. Qualified under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Engineering, Kyoto University.